We present ALX, an open-source library for distributed matrix factorization written in JAX. Our design allows efficient use of the TPU architecture and scales to matrix factorization problems of O(B) rows/columns by scaling the number of available TPU cores. To spur future research on large-scale matrix factorization methods, and to illustrate the scalability properties of our own implementation, we also built WebGraph, a real-world web link prediction dataset that can be readily modeled as a matrix factorization problem. We created several variants of this dataset based on locality and sparsity properties of sub-graphs. The largest variant of WebGraph has around 365M nodes, and training a single epoch finishes in about 20 minutes with 256 TPU cores. We include speed and performance numbers for ALX on all variants of WebGraph. Both the framework code and the dataset are open-sourced.
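To make the core computation concrete, here is a minimal single-host sketch of one alternating-least-squares-style update in JAX, the kind of per-row solve that a library like this shards across TPU cores. The ALS formulation, dimensions, regularization value, and interaction weights are illustrative assumptions, not ALX's actual API or configuration.

```python
# A minimal sketch of an ALS-style row update in JAX (illustrative only).
import jax
import jax.numpy as jnp

def als_update_row(item_factors, item_ids, weights, reg=0.1):
    """Solve the regularized least-squares problem for one row's factors,
    given the factors of the items that row interacted with."""
    V = item_factors[item_ids]          # (nnz, k): factors of observed items
    k = V.shape[1]
    A = V.T @ V + reg * jnp.eye(k)      # (k, k) normal-equations matrix
    b = V.T @ weights                   # (k,) right-hand side
    return jnp.linalg.solve(A, b)

key = jax.random.PRNGKey(0)
item_factors = jax.random.normal(key, (1000, 64))  # 1000 items, rank 64
item_ids = jnp.array([3, 17, 256])                 # items one row interacted with
weights = jnp.ones(3)                              # illustrative interaction weights
row_factors = als_update_row(item_factors, item_ids, weights)
print(row_factors.shape)  # (64,)
```

Scaling this to O(B) rows is then a matter of sharding the factor tables and batching such solves across cores, which is the part a distributed library automates.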
Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples, and their reasoning capabilities remain quite limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key to our method is a modality conversion module, named DePlot, which translates the image of a plot or chart into a linearized table. The output of DePlot can then be used directly to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on more than 28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over the finetuned SOTA on human-written queries from the chart QA task.
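As an illustration of the two-step pipeline, the sketch below first de-renders a chart image into a linearized table and then splices that table into an LLM prompt. It assumes the publicly released google/deplot checkpoint and the transformers Pix2Struct classes; the prompt wording, the file name chart.png, and the some_llm placeholder are illustrative.

```python
# Step 1: de-render the chart into a linearized table with DePlot.
from PIL import Image
from transformers import Pix2StructProcessor, Pix2StructForConditionalGeneration

processor = Pix2StructProcessor.from_pretrained("google/deplot")
model = Pix2StructForConditionalGeneration.from_pretrained("google/deplot")

image = Image.open("chart.png")  # illustrative input file
inputs = processor(images=image,
                   text="Generate underlying data table of the figure below:",
                   return_tensors="pt")
table_ids = model.generate(**inputs, max_new_tokens=512)
table = processor.decode(table_ids[0], skip_special_tokens=True)

# Step 2: prompt any few-shot-capable LLM with the linearized table.
prompt = (f"Read the table and answer the question.\n\n{table}\n\n"
          "Question: Which year has the highest value?\nAnswer:")
# answer = some_llm.generate(prompt)  # hypothetical call; plug in any LLM
```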
Visual language data such as plots, charts, and infographics are ubiquitous in the human world. However, state-of-the-art vision-language models do not perform well on these data. We propose MatCha (Math reasoning and Chart derendering pretraining) to enhance visual language models' capabilities in jointly modeling charts/plots and language data. Specifically, we propose several pretraining tasks that cover plot deconstruction and numerical reasoning, which are key capabilities in visual language modeling. We perform the MatCha pretraining starting from Pix2Struct, a recently proposed image-to-text visual language model. On standard benchmarks such as PlotQA and ChartQA, the MatCha model outperforms state-of-the-art methods by nearly 20%. We also examine how well MatCha pretraining transfers to domains such as screenshots, textbook diagrams, and document figures and observe overall improvement, verifying the usefulness of MatCha pretraining on broader visual language tasks.
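A hedged sketch of how a chart-derendering pretraining pair might be synthesized: render a small table as a plot (the image input) and keep the linearized table as the text target. The linearization format and the plotting choices here are illustrative assumptions, not MatCha's exact data recipe.

```python
# Synthesize one (plot image, linearized table) pretraining pair.
import matplotlib
matplotlib.use("Agg")  # headless rendering
import matplotlib.pyplot as plt

def make_derendering_example(headers, rows, path="plot.png"):
    xs = [str(r[0]) for r in rows]
    ys = [r[1] for r in rows]
    plt.figure()
    plt.bar(xs, ys)        # image side of the training pair
    plt.savefig(path)
    plt.close()
    # Text target: the table, linearized row by row (illustrative format).
    lines = [" | ".join(headers)]
    lines += [" | ".join(str(c) for c in r) for r in rows]
    return path, "\n".join(lines)

image_path, target_text = make_derendering_example(
    ["year", "sales"], [(2020, 12), (2021, 18), (2022, 9)])
print(target_text)
```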
Achieving artificial intelligence (AI)-native wireless networks is necessary for the operation of future 6G applications such as the metaverse. Nonetheless, current communication schemes are, at heart, a mere reconstruction process that lacks reasoning. One key solution for evolving wireless communication toward human-like conversation is semantic communications. In this paper, a novel machine reasoning framework is proposed to pre-process and disentangle source data so as to make it semantic-ready. In particular, a novel contrastive learning framework is proposed, whereby instance and cluster discrimination are performed on the data. These two tasks increase the cohesiveness between data points mapping to semantically similar content elements and disentangle data points of semantically different content elements. Subsequently, the semantic deep clusters formed are ranked according to their level of confidence. Deep semantic clusters of highest confidence are considered learnable, semantic-rich data, i.e., data that can be used to build a language in a semantic communications system. The least confident ones are considered random, semantic-poor, memorizable data that must be transmitted classically. Our simulation results showcase the superiority of our contrastive learning approach in terms of semantic impact and minimalism. In fact, the length of the semantic representation achieved is reduced by 57.22% compared to vanilla semantic communication systems, thus achieving minimalist semantic representations.
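The sketch below illustrates the general shape of joint instance- and cluster-level contrastive objectives of the kind described above: the same InfoNCE-style loss applied to row embeddings (instance discrimination) and to transposed soft cluster assignments (cluster discrimination). Temperatures, dimensions, and the two-view pairing are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.5):
    """Contrast rows of `a` against rows of `b`; matching rows are positives."""
    a = F.normalize(a, dim=1)
    b = F.normalize(b, dim=1)
    logits = a @ b.t() / temperature       # pairwise cosine similarities
    targets = torch.arange(a.size(0))      # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

n, d, k = 32, 128, 10                          # batch size, embed dim, #clusters
z1, z2 = torch.randn(n, d), torch.randn(n, d)  # two augmented views (stand-ins)
instance_loss = info_nce(z1, z2)               # instance discrimination

# Cluster discrimination: soft cluster assignments; contrasting the *columns*
# compares each cluster's activation pattern across the batch between views.
p1 = torch.softmax(torch.randn(n, k), dim=1)
p2 = torch.softmax(torch.randn(n, k), dim=1)
cluster_loss = info_nce(p1.t(), p2.t())

loss = instance_loss + cluster_loss            # joint objective
```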
This work addresses the problems of (a) designing utilization measurements of trained artificial intelligence (AI) models and (b) explaining how training data are encoded in AI models based on those measurements. The problems are motivated by the lack of explainability of AI models in security- and safety-critical applications, such as the use of AI models for classification of traffic signs in self-driving cars. We approach the problems by introducing theoretical underpinnings of AI model utilization measurement and by understanding patterns in utilization-based class encodings of traffic signs at the level of computation graphs (AI models), subgraphs, and graph nodes. Conceptually, utilization is defined at each graph node (computation unit) of an AI model based on the number and distribution of unique outputs in the space of all possible outputs (tensor-states). In this work, utilization measurements are extracted from AI models, which include poisoned and clean AI models. In contrast to clean AI models, the poisoned AI models were trained with traffic sign images containing systematic, physically realizable, traffic sign modifications (i.e., triggers) that change a correct class label to another label in the presence of such a trigger. We analyze class encodings of such clean and poisoned AI models, and conclude with implications for trojan injection and detection.
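As a toy illustration of a utilization-style measurement, the sketch below counts the distinct quantized output patterns a layer produces over a dataset, relative to the number of inputs. Binarizing activations by sign is an illustrative stand-in for the paper's tensor-state definition.

```python
import numpy as np

def layer_utilization(activations):
    """activations: (num_inputs, num_units) recorded outputs of one layer."""
    states = (activations > 0).astype(np.uint8)          # quantized tensor-states
    unique_states = np.unique(states, axis=0).shape[0]   # distinct patterns seen
    return unique_states / activations.shape[0]          # 1.0 = all inputs distinct

acts = np.random.randn(1000, 64)  # stand-in for activations recorded on a dataset
print(f"utilization: {layer_utilization(acts):.3f}")
```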
Aspect Based Sentiment Analysis is a dominant research area with potential applications in social media analytics, business, finance, and health. Prior works in this area are primarily based on supervised methods, with a few techniques using weak supervision limited to predicting a single aspect category per review sentence. In this paper, we present an extremely weakly supervised multi-label Aspect Category Sentiment Analysis framework that does not use any labelled data. We rely on only a single word per class as initial indicative information. We further propose an automatic word selection technique to choose these seed category and sentiment words. We explore unsupervised language model post-training to improve the overall performance, and propose a multi-label generator model to generate multiple aspect category-sentiment pairs per review sentence. Experiments conducted on four benchmark datasets show that our method outperforms other weakly supervised baselines by a significant margin.
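A minimal sketch of the single-seed-word idea: score a sentence against one indicative word per aspect category via embedding similarity. The toy vectors and seed words are illustrative assumptions; the actual framework additionally selects seeds automatically, post-trains the language model, and generates multiple category-sentiment pairs.

```python
import numpy as np

def embed(text, vectors):
    """Average pretrained word vectors over the tokens of `text`."""
    tokens = [t for t in text.lower().split() if t in vectors]
    return np.mean([vectors[t] for t in tokens], axis=0)

def score_aspects(sentence, seed_words, vectors):
    s = embed(sentence, vectors)
    scores = {}
    for aspect, seed in seed_words.items():
        v = vectors[seed]
        scores[aspect] = float(s @ v / (np.linalg.norm(s) * np.linalg.norm(v)))
    return scores  # cosine similarity to each class's seed word

# Toy 3-d "pretrained" vectors; real usage would load GloVe/word2vec.
vectors = {"food": np.array([1.0, 0.1, 0.0]), "pizza": np.array([0.9, 0.2, 0.1]),
           "service": np.array([0.0, 1.0, 0.1]), "waiter": np.array([0.1, 0.9, 0.2])}
seeds = {"FOOD": "food", "SERVICE": "service"}  # one seed word per class
print(score_aspects("the pizza was great but the waiter was rude", seeds, vectors))
```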
Session-based recommender systems (SBRSs) have shown superior performance over conventional methods. However, they show limited scalability on large-scale industrial datasets, since most models learn one embedding per item. This leads to a large memory requirement (storing one vector per item) and poor performance on sparse sessions with cold-start or unpopular items. Using one public and one large industrial dataset, we experimentally show that state-of-the-art SBRSs perform poorly on sparse sessions with sparse items. We propose M2TRec, a metadata-aware multi-task transformer model for session-based recommendations. Our proposed method learns a transformation function from item metadata to embeddings and is therefore item-ID free (i.e., it does not need to learn one embedding per item). It integrates item metadata to learn shared representations of diverse item attributes. During inference, new or unpopular items are assigned representations for the attributes they share with items observed during training, and thus obtain representations similar to those items, enabling recommendations even for cold-start and sparse items. Additionally, M2TRec is trained in a multi-task setting to predict the next item in a session along with its primary category and subcategories. Our multi-task strategy makes the model converge faster and significantly improves overall performance. Experimental results show significant performance gains on sparse items using our proposed approach on both datasets.
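The item-ID-free idea can be sketched as follows: an item's representation is computed from shared attribute embedding tables rather than looked up in a per-item table, so a cold-start item with known metadata still gets a meaningful vector. The attribute fields, dimensions, and fusion layer are illustrative assumptions, not M2TRec's exact architecture.

```python
import torch
import torch.nn as nn

class MetadataItemEncoder(nn.Module):
    def __init__(self, num_categories, num_brands, dim=64):
        super().__init__()
        self.category = nn.Embedding(num_categories, dim)  # shared, not per-item
        self.brand = nn.Embedding(num_brands, dim)
        self.proj = nn.Linear(2 * dim, dim)                # transformation function

    def forward(self, category_id, brand_id):
        feats = torch.cat([self.category(category_id),
                           self.brand(brand_id)], dim=-1)
        return self.proj(feats)  # item embedding; no per-item table needed

encoder = MetadataItemEncoder(num_categories=50, num_brands=200)
# A brand-new item is representable as long as its metadata was seen in training.
cold_item = encoder(torch.tensor([3]), torch.tensor([17]))
print(cold_item.shape)  # torch.Size([1, 64])
```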
This paper considers improving the wireless communication and computation efficiency of federated learning (FL) via model quantization. In the proposed bitwidth FL scheme, edge devices train and transmit quantized versions of their local FL model parameters to a coordinating server, which aggregates them into a quantized global model and synchronizes the devices. The goal is to jointly determine the bitwidths employed for local FL model quantization and the set of devices participating in FL training at each iteration. This is posed as an optimization problem whose objective is to minimize the training loss of quantized FL under a per-iteration device sampling budget and delay requirement. To derive a solution, an analytical characterization is performed to show how the limited wireless resources and the induced quantization errors affect the performance of the proposed FL method. The analytical results show that the improvement in FL training loss between two consecutive iterations depends on the device selection and quantization scheme, as well as on several parameters inherent to the model being learned. Given linear-regression-based estimates of these model properties, it is shown that the FL training process can be described as a Markov decision process (MDP), and a model-based reinforcement learning (RL) method is then proposed to optimize action selection over iterations. Compared with model-free RL, this model-based RL approach leverages the derived mathematical characterization of the FL training process to discover an effective device selection and quantization scheme without imposing additional device communication overhead. Simulation results show that the proposed FL algorithm can reduce convergence time by 29% and 63% compared to a model-free RL method and the standard FL method, respectively.
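To make the bitwidth trade-off concrete, here is a minimal sketch of uniform symmetric quantization of a parameter vector to a chosen bitwidth, the operation each edge device would apply before transmission. The quantizer design is an illustrative assumption, not necessarily the paper's exact scheme.

```python
import numpy as np

def quantize(params, bits):
    """Uniformly quantize a parameter vector to 2**bits levels and dequantize."""
    levels = 2 ** bits - 1
    scale = float(np.max(np.abs(params))) or 1.0      # avoid divide-by-zero
    q = np.round((params / scale + 1) / 2 * levels)   # map [-s, s] -> {0..levels}
    return (q / levels * 2 - 1) * scale               # reconstructed parameters

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
for b in (2, 4, 8):
    err = np.mean((w - quantize(w, b)) ** 2)
    print(f"{b}-bit quantization MSE: {err:.5f}")  # error shrinks with bitwidth
```

The induced quantization error shown here is exactly what the joint bitwidth/device selection must weigh against the per-iteration communication and delay budget.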
Large pretrained models such as GPT-3 have had a tremendous impact on modern natural language processing by leveraging self-supervised learning to learn salient representations that can readily be finetuned for a wide variety of downstream tasks. We investigate the possibility of transferring such advances to molecular machine learning by building a chemical foundation model, ChemBERTa-2, using the language of SMILES. While labeled data for molecular prediction tasks is typically scarce, libraries of SMILES strings are readily available. In this work, we build upon ChemBERTa by optimizing the pretraining process. We compare multi-task and self-supervised pretraining across different hyperparameters and pretraining dataset sizes, up to 77M compounds from PubChem. To our knowledge, the 77M set constitutes one of the largest datasets used for molecular pretraining to date. We find that with these pretraining improvements, we are competitive with existing state-of-the-art architectures on the MoleculeNet benchmark suite. We analyze the degree to which improvements in pretraining translate to improvements on downstream tasks.
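A hedged sketch of the underlying self-supervised objective: masked-language-model pretraining on SMILES strings with Hugging Face transformers. The stand-in roberta-base tokenizer, the tiny model config, and the three-molecule corpus are illustrative; ChemBERTa-2's actual tokenizer, corpus (up to 77M PubChem compounds), and hyperparameters differ.

```python
from transformers import (DataCollatorForLanguageModeling, RobertaConfig,
                          RobertaForMaskedLM, RobertaTokenizerFast)

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")  # stand-in
config = RobertaConfig(vocab_size=tokenizer.vocab_size, hidden_size=256,
                       num_hidden_layers=3, num_attention_heads=4)  # tiny demo
model = RobertaForMaskedLM(config)

smiles = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]  # toy SMILES corpus
batch = tokenizer(smiles, padding=True)
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
masked = collator([{"input_ids": ids} for ids in batch["input_ids"]])
loss = model(input_ids=masked["input_ids"], labels=masked["labels"]).loss
print(float(loss))  # the self-supervised objective minimized over the corpus
```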
Computational humor detection systems rarely model the subjectivity of humor responses, or consider an alternative reaction to humor, namely offense. We analyze a large dataset of humor and offense ratings by male and female annotators of different age groups. We find that women link these two concepts more strongly than men, and that they tend to give lower humor ratings and higher offense scores. We also find that the correlation between humor and offense increases with age. Although there were no gender or age differences in finding content humorous, women and older annotators reported understanding joke texts more often than men did. We discuss the implications for computational humor detection and downstream tasks.
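A toy sketch of the kind of per-group analysis reported here: the humor-offense correlation computed separately for each annotator group. The synthetic ratings and group labels are illustrative assumptions.

```python
import pandas as pd

ratings = pd.DataFrame({
    "group":   ["women"] * 4 + ["men"] * 4,
    "humor":   [1.0, 2.0, 3.0, 1.5, 3.0, 2.5, 4.0, 3.5],
    "offense": [4.0, 3.5, 2.0, 4.5, 1.0, 2.0, 0.5, 1.5],
})
# Pearson correlation between humor and offense ratings within each group.
for group, df in ratings.groupby("group"):
    print(group, round(df["humor"].corr(df["offense"]), 2))
```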